Skip to content

Added functionality for PCA analysis together with related unit tests.#574

Closed
ashkurti wants to merge 3 commits into
MDAnalysis:developfrom
ashkurti:pca_analysis
Closed

Added functionality for PCA analysis together with related unit tests.#574
ashkurti wants to merge 3 commits into
MDAnalysis:developfrom
ashkurti:pca_analysis

Conversation

@ashkurti

@ashkurti ashkurti commented Dec 9, 2015

Copy link
Copy Markdown

The core functionality is in pcz.py (added at mdanalysis/package/analysis). The file mdanalysis/testsuite/MDAnalysisTests/analysis/test_pca.py is the “nose” testing file that the MDAnalysis distribution requires while there are other three data files that are used for the testing, namely: mdanalysis/testsuite/MDAnalysisTests/data/test_evecs.npy, mdanalysis/testsuite/MDAnalysisTests/data/test_evals.npy, mdanalysis/testsuite/MDAnalysisTests/data/test_scores.npy.

Further information and comments can be found among the comments of the pcz.py module.

@richardjgowers

Copy link
Copy Markdown
Member

Hey

Looks good.

The fail on the Full build is because of conda/scipy/travis not playing well together, I'm looking into it.

Because scipy is an optional dependency you need to decorate any tests that use scipy to allow them to fail if scipy isn't present. See here for example:

https://github.com/MDAnalysis/mdanalysis/blob/develop/testsuite/MDAnalysisTests/analysis/test_distances.py#L27

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is PEP8 convention to line up the the opening brace

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, I have lined it up in this way:

    def __init__(self, universe, atom_selection=None, start=None, stop=None,
    step=None, reference_coordinates=None, fitting_tolerance = 0.01,
    max_iterations = 10, captured_variance = None
    ):

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should line up on the opening brace

    def __init__(self, universe, atom_selection=None, start=None, stop=None,
                        step=None, reference_coordinates=None, fitting_tolerance = 0.01,
                        max_iterations = 10, captured_variance = None ):

To keep you from manual editing everything you can use yapf to autoformat the code

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yapf looks very good for this, I have used it on all my .py modules.

@kain88-de

Copy link
Copy Markdown
Member

Thanks for implementing a PCA. I thought I had to do that myself for a project next year.

@ashkurti

Copy link
Copy Markdown
Author

@richardjgowers
I think that scipy is mandatory for this functionality since it stands at the core of performing the algebra calucaltions for the PCA. If we say that we skip the tests if scipy is not available, then, PCA analysis will not be available either.

@jbarnoud

Copy link
Copy Markdown
Contributor

@ashkurti You are right. Yet, the analysis part is an optional component of MDAnalysis; scipy, that is only used in the analysis part, is therefore an optional requirement. The idea is that the tests can run with the minimal install of MDAnalysis, but only for the parts that can work with that minimal install. The PCA analysis will still be tested if scipy is installed.

@orbeckst orbeckst added this to the 0.13 milestone Dec 10, 2015
@orbeckst

Copy link
Copy Markdown
Member

Add information to the CHANGELOG (for the 0.13 release – I think we'll manage to get it in). Follow the example of what's already there. Append your GitHub handle ashkurti to the contributors list at the top of the 0.13 entry.

Also add authorship information: add your name to

  • AUTHORS
  • doc/sphinx/source/conf.py (look for the authors variable)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@orbeckst

Copy link
Copy Markdown
Member

If you rebase against the current develop, i.e.

git rebase develop pca_analysis

and then force-update this branch with

git push --force

then the unit tests should pass (because #576 was fixed recently). The QuantfiedCode tests are less important at the moment (see #583 for discussion if you're interested).

However, please address the review comments. As in academic peer review: Either make the changes or explain in a discussion why you do not want to change something.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+A PCA module for MDAnalysis
+==========================================================================

should be
+A PCA module for MDAnalysis
+=======================

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I have replaced ========================================================================== with =======================

@kain88-de

Copy link
Copy Markdown
Member

@ashkurti I had some time today to try your PCA module. I tried to reproduce results on random data.
I uploaded everything to this gist. I have tried to reproduce the results from this paper, unfortunately I couldn't. My guess is that something in your implementation is wrong but it can also be that I used it wrong.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also noticed that it has problems when atom_selection is None. I had to set it to all for the PCA class to be successfully initialized.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have set the default value of atom_selection to all now.

@ashkurti

Copy link
Copy Markdown
Author

@orbeckst and all, thanks for your comments and instructions. I will do the further integration according to the instructions shortly as soon as I find any free time slots!

@ashkurti

Copy link
Copy Markdown
Author

@richardjgowers Where does scipy get used at https://github.com/MDAnalysis/mdanalysis/blob/develop/testsuite/MDAnalysisTests/analysis/test_distances.py#L27 if the scipy module is found (except for the decorator).

@kain88-de

Copy link
Copy Markdown
Member

@ashkurti scipy is used once we import MDAnalysis.analysis.distances. We need it's support for sparse matrices. See here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants